Abstract

Mathematical models use information from past observations to generate predictions
about the future. If two models make identical predictions, the one that needs
less information from the past to do so is preferred. It is already known that certain
classical models (certain Hidden Markov Models called ε-machines, which are often
optimal classical models) are not in general the preferred ones. We extend this result
and show that even optimal classical models (models with minimal internal entropy)
are in general not the best possible models (called ideal models). Instead of optimal
classical models we can construct quantum models which are significantly better
(i.e. they have a strictly smaller internal entropy) but not yet the best possible ones. In
this paper we give conditions under which the internal entropies of classical models
and specific quantum models coincide. Furthermore, it turns out that this situation
occurs very rarely. An example shows that our results hold only for the specific
quantum model construction and in general not for alternative constructions.
Another example shows that classical models with minimal internal entropy
need not be related to quantum models with minimal internal entropy.

1 Introduction 

Mathematical modeling of natural and technological systems plays an important role in 
modern science. In general, there are many ways to model a system mathematically. 
One possibility is to view the system of interest as an information processing black box 
generating an observable output from given past observations. The observed data can
be treated as a stochastic process, and we look for Hidden Markov Models that generate
the same statistical behaviour; we refer to these as classical models. We prefer models
which predict future data from past observations in an optimal way, i.e. with as little
memory as possible. The amount of
information the past contains about the future is measured by the mutual information
between past and future data. This quantity is known as excess entropy [Cru83]. A model
that predicts future data in an optimal way has to store at least this
amount of information. One systematic method to construct such a model
is used in computational mechanics and yields the (classical) ε-machine. ε-machines
are the optimal classical models within a certain subset of all possible
Hidden Markov Models, but not the optimal classical models in general. The optimality
of a classical model is quantified by the classical internal state entropy of the model.
Usually this is the Shannon entropy, and for an optimal classical model the internal state
entropy is called generative complexity $C_{cl}$. Instead of classical models one can
consider analogous quantum models (called Hidden Quantum Markov Models). Recent
results show that if the classical ε-machine is not already the best possible model (called
ideal model), it is always possible to find a quantum model that needs less memory than
the classical ε-machine to reconstruct the statistical behaviour of the stochastic process
[GuW11]. Usually the internal state entropy $C_q$ of the quantum model is strictly greater
than the excess entropy $E$, so there remains room for improvement. We extend this
result to all optimal classical models.

The Hidden Quantum Markov Model induced by a classical Hidden Markov Model
can be formulated in the setting of a quantum channel. The initial distribution and the
transition probabilities of a classical Hidden Markov Model (Definition 1) can be used
to calculate the mutual information $I(X;Y)$ between a specific classical input random
variable $X$ and a classical output random variable $Y$ related to the classical model. In the
subsequent sections we obtain the following chain of inequalities:

    $E \le I(X;Y) \le C_q \le C_{cl}$.

In this paper we investigate, for a specific quantum model construction, equality conditions
for the last two inequalities above. We will see that in general there remains a gap
between the different internal state entropies for the quantum model construction
introduced in [GuW11] and that the last two inequalities are strict in most cases.
Furthermore, for ε-machines we prove that $E = I(X;Y)$ holds and show with an example
that a quantum model induced by a minimal classical model need not be the minimal quantum
model. The relationship between minimal classical models and minimal quantum models
remains an open question.

This paper is organized as follows. In Section 2 some basic notations and definitions
are introduced. Section 3 introduces ε-machines, restates a recently proved theorem
and extends this theorem to minimal Hidden Markov Models. Section 4 introduces Hidden
Quantum Markov Models. Furthermore, two well-known propositions applied to our
context are presented and we generalize a further theorem from ε-machines to minimal
Hidden Markov Models. The example which shows that minimal classical models do not
correspond to minimal quantum models is also presented there. In Section 5 we prove the
equality conditions for the internal entropies, and in Section 6 we present a calculation
example and verify the proven results. Section 7 describes an alternative construction of
a quantum model for a stochastic process and shows that the equality conditions in
Section 5 in general cannot be extended to quantum model constructions other than the
one suggested in Section 4.

2 Preliminaries 

Let $(\Omega, \mathcal{F}, P)$ be a probability space with a metric space $\Omega$, a σ-algebra $\mathcal{F}$ and a probability
measure $P$. For random variables $X, Y : \Omega \to \Sigma$ mapping to a finite alphabet $\Sigma$ the
Shannon entropy is defined by

    $H(X) := -\sum_{x \in \Sigma} P(X = x) \log P(X = x)$,
and the conditional Shannon entropy by

    $H(X|Y) := -\sum_{x, y \in \Sigma} P(X = x, Y = y) \log P(X = x \mid Y = y)$,

where $P(X = x) := P(\{\omega \in \Omega \mid X(\omega) = x\})$ denotes the probability that the random
variable $X$ is equal to $x \in \Sigma$, $P(X = x, Y = y)$ is the joint probability of $X$ and
$Y$, and for $P(Y = y) > 0$ the conditional probability is $P(X = x \mid Y = y) := P(X = x, Y = y) / P(Y = y)$.
In the definitions the convention $0 \log(0) = 0$ is used. Given a distribution $\mu$ of a random
variable $X$ we sometimes write $H(\mu)$ instead of $H(X)$. The mutual information between
two random variables is

    $I(X;Y) := H(X) - H(X|Y)$.

The mutual information is non-negative ($I(X;Y) \ge 0$) and equals zero if and only if $X$
and $Y$ are independent random variables [Cov06].
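
The definitions above can be checked numerically for finite alphabets. The following is a minimal sketch, assuming logarithms to base 2 (so entropies are in bits) and a joint distribution given as a nested list; the function names `shannon_entropy` and `mutual_information` are illustrative and not taken from the paper.

```python
import math

def shannon_entropy(dist):
    """H(X) in bits for a list of probabilities, using the convention 0*log(0) = 0."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y) for a joint table joint[x][y] = P(X=x, Y=y)."""
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    # H(X|Y) = -sum_{x,y} P(x,y) * log P(x|y), with P(x|y) = P(x,y) / P(y)
    h_x_given_y = -sum(
        pxy * math.log2(pxy / py[y])
        for row in joint
        for y, pxy in enumerate(row)
        if pxy > 0
    )
    return shannon_entropy(px) - h_x_given_y

independent = [[0.25, 0.25], [0.25, 0.25]]   # X and Y independent fair bits
copy = [[0.5, 0.0], [0.0, 0.5]]              # Y is an exact copy of X
```

For the independent table the mutual information vanishes, and for the copy table it equals the full entropy of one fair bit.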

We consider a time-discrete stationary stochastic process $\overleftrightarrow{X} := (X_t)_{t \in \mathbb{Z}}$ with random
variables $X_t : \Omega \to \Sigma$ for all $t \in \mathbb{Z}$. We define the semi-infinite processes $\overleftarrow{X} := (X_{-t})_{t \in \mathbb{N}}$,
interpreted as past, and $\overrightarrow{X} := (X_t)_{t \in \mathbb{N}}$, interpreted as future. Blocks of random
variables with finite length are denoted by $X_a^b := (X_k)_{k \in [a,b] \cap \mathbb{Z}}$ for $-\infty \le a \le b \le \infty$. The
one-sided sequence space is $\Sigma^{\mathbb{N}} := \times_{i \in \mathbb{N}} \Sigma$, and the two-sided sequence
space $\Sigma^{\mathbb{Z}}$ is defined in the same way. We introduce the shift function $\sigma : \Sigma^{\mathbb{Z}} \to \Sigma^{\mathbb{Z}}$ by $\sigma(x)_i := x_{i+1}$. At
any time $t \in \mathbb{Z}$ we have random variables $X_{-\infty}^{t} := (X_k)_{k \le t}$ and $X_{t+1}^{\infty} := (X_k)_{k \ge t+1}$ that
govern the system's observed behaviour in the shifted past and the shifted
future, respectively. The mutual information between these two variables is the well-known excess
entropy [Cru83, Cru03]

    $E := \lim_{L \to \infty} I(X_{-L}^{0}; X_{1}^{L})$. (1)

In general, it is not clear whether the limit in (1) exists. We will see later that in the setting of
this paper $E$ always exists. Under the assumption that the limit in (1) exists as a finite
number, the following equality holds: $E = I(\overleftarrow{X}; \overrightarrow{X})$, see Chapter 2.2 in [Pin64].

The stochastic process generates a sequence of output symbols which represents the
observed behaviour of a system for which we construct a mathematical model in a
discretized fashion.

We use a Hidden Markov Model (HMM) to model a given stochastic process. In
general there are different kinds of HMMs. For our purpose we use a transition-emitting
HMM and use the same terminology as in [Loe10, Loe09a, Loe09b].

Definition 1 With $\mathcal{P}(A)$ we denote the space of all probability measures on a set $A$. A
transition-emitting HMM consists of a set $\mathcal{S}$ of internal states and a pair $(T, \mu)$
with an initial distribution $\mu \in \mathcal{P}(\mathcal{S})$ and a measurable function $T : \mathcal{S} \to \mathcal{P}(\mathcal{S} \times \Sigma)$,
called generator. We say that $(T, \mu)$ is an HMM of $\overleftrightarrow{X}$ if the output distribution, which
is determined by the output kernel $K_s(\cdot) := T(s)(\mathcal{S} \times \cdot)$, $s \in \mathcal{S}$, of the HMM, coincides with
the distribution of $\overleftrightarrow{X}$.

In the following we abbreviate transition-emitting HMM with HMM. Since we are 
considering stationary stochastic processes we require that the HMM is invariant in the 
following sense. 
Definition 2 An HMM $(T, \mu)$ is invariant if $\mu$ is $T$-invariant, i.e.

    $\mu(G) = \int T(s)(G \times \Sigma) \, d\mu(s), \quad \forall G \in \mathcal{G}$.

We are interested in HMMs with minimal internal state entropy $H(\mu)$, which can be
considered as a complexity measure of the process generated by the HMM. Following Löhr
[Loe09c, Loe10] we define the generative complexity.

Definition 3 The (classical) generative complexity of a stationary stochastic process
$\overleftrightarrow{X}$ is the infimum of the entropies of internal states

    $C_{cl} := \inf \{ H(\mu) \mid (T, \mu) \text{ is an invariant HMM of } \overleftrightarrow{X} \}$.

Löhr showed that for every stationary stochastic process there exists an invariant
HMM $(T, \mu)$ such that

    $H(\mu) = C_{cl}$

holds, so the infimum in Definition 3 is actually a minimum (Corollary 4.14 in [Loe10]).
In the following we refer to this invariant HMM as the minimal HMM.

The generative complexity is an upper bound for the excess entropy [Loe10]:

    $E \le C_{cl}$. (2)

In this paper we only consider processes which can be modeled by a minimal HMM
with finitely many internal states. Markov processes of finite order are examples of
processes with a finite set of internal states. Assuming finitely many internal states $\mathcal{S} =
\{S_1, \dots, S_n\}$, we can write the initial distribution as a probability vector $\mu := (p_i)_{i=1}^{n}$ and
the generator as a set of substochastic $n \times n$ matrices $T^{(r)}$ with entries $T^{(r)}_{ij} := T(S_i)(S_j, r)$
for all $r \in \Sigma$. Since we are considering only a finite set of internal states, $C_{cl}$ is always
finite, and with (2) the excess entropy (1) is also finite.
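
For finitely many states, the invariance condition of Definition 2 reduces to $p_j = \sum_{i} \sum_{r} p_i T^{(r)}_{ij}$ for every state $j$. The sketch below checks this condition and evaluates the internal state entropy $H(\mu)$ in bits. The storage layout `T[r][i][j]` and the two-state generator with its numbers are hypothetical choices made only for illustration.

```python
import math

def is_invariant(T, mu, tol=1e-12):
    """Check mu[j] == sum_i sum_r mu[i] * T[r][i][j] for every state j."""
    n = len(mu)
    for j in range(n):
        inflow = sum(mu[i] * T[r][i][j] for r in T for i in range(n))
        if abs(inflow - mu[j]) > tol:
            return False
    return True

def internal_state_entropy(mu):
    """Shannon entropy H(mu) in bits of the initial distribution."""
    return -sum(p * math.log2(p) for p in mu if p > 0)

# Hypothetical two-state generator over the alphabet {0, 1}:
# T[r][i][j] is the probability to emit r and move from state i to state j.
T = {
    0: [[0.7, 0.0], [0.3, 0.0]],
    1: [[0.0, 0.3], [0.0, 0.7]],
}
mu = [0.5, 0.5]
```

Here `is_invariant(T, mu)` holds for the uniform distribution, while a skewed vector such as `[0.9, 0.1]` fails the check.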



3 ε-Machines and minimal HMMs

The following construction of a transition-emitting HMM is often considered in the literature,
and the resulting HMM coincides in many cases with a minimal HMM. However, this
construction does not always lead to a minimal HMM, as is often wrongly claimed in the
literature (see [Loe10, Loe09b, Loe09c] for counterexamples). On the set $\Sigma^{\mathbb{N}}$ of all past
trajectories of the process $\overleftrightarrow{X}$ we define an equivalence relation [Sha01]

    $x \sim x' :\iff P(\overrightarrow{X} \in A \mid \overleftarrow{X} = x) = P(\overrightarrow{X} \in A \mid \overleftarrow{X} = x'), \quad \forall A \in \mathcal{C}$, (3)

where $x, x' \in \Sigma^{\mathbb{N}}$, $\mathcal{C}$ is the product σ-algebra generated by cylinder sets on $\Sigma^{\mathbb{N}}$ and
$P(\overrightarrow{X} \in \cdot \mid \overleftarrow{X} = x)$ is a regular version¹ of the conditional probability. The equivalence
classes

    $S(x) := \{x' \in \Sigma^{\mathbb{N}} \mid x' \sim x\}$

of relation (3) are called causal states and are the internal states of the constructed HMM.
The set of all causal states is denoted by $\mathcal{S} := \{S(x) \mid x \in \Sigma^{\mathbb{N}}\}$ and is measurable (Lemma
3.18 in [Loe10]). In general there can be uncountably many causal states [Cru94, Loe10,
Loe09b], and the causal states depend on the version of conditional probability used in the
definition [Loe10]. We say that the number of causal states is finite if there exists a version
of conditional probability such that there are only finitely many equivalence classes. A
characteristic property of causal states is that they induce a minimal sufficient memory².

1 $P(\overrightarrow{X} \in \cdot \mid \overleftarrow{X} = x)$ is called a regular version if it is a Markov kernel.

We only consider stationary stochastic processes with a finite set of causal
states $\mathcal{S} = \{S_1, \dots, S_n\}$. Given a past observation of infinite length $x_{-\infty}^{t} \in \Sigma^{\mathbb{Z}}$ at time
$t \in \mathbb{Z}$, using stationarity we identify this shifted past with a causal state $S(\sigma^{-t-1}(x_{-\infty}^{t})) \in
\mathcal{S}$. Together with the next symbol $x_{t+1}$ generated by the process, the next causal state
$S(\sigma^{-t-2}(x_{-\infty}^{t} x_{t+1})) \in \mathcal{S}$ is uniquely determined, and the causal states are Markov [Sha01,
Loe10]. We define the Markov kernels between two causal states $S_i, S_j \in \mathcal{S}$ emitting an
output symbol $r \in \Sigma$ for any $t \in \mathbb{Z}$ as follows:

    $T^{(r)}_{ij} := T(S_i)(S_j, r)
               = P\big( S(\sigma^{-t-2}(x_{-\infty}^{t} x_{t+1})) = S_j, \ X_{t+1} = r \mid S(\sigma^{-t-1}(x_{-\infty}^{t})) = S_i \big)$.

The probability of a causal state $S_i \in \mathcal{S}$ is denoted by $p_i := P(S_i)$. The ordered pair
$(T, (p_1, \dots, p_n))$ is called ε-machine. The ε-machine is a transition-emitting HMM and a
model for the original stochastic process [Loe10, Loe09a].

Remark 1 In general the ε-machine is not the HMM with the minimal number of internal
states and also not the one with minimal classical internal state entropy. To be precise,
Löhr proved in [Loe10] that for a countable alphabet $\Sigma$ the ε-machine is the minimal
partially deterministic HMM³ of the process $\overleftrightarrow{X}$.



The ε-machine has classical internal state entropy

    $C_\varepsilon := H(\mathcal{S}) = -\sum_{j=1}^{n} p_j \log p_j$,

which is also known as statistical complexity [Gra86, Sha01]. Since the generative complexity
is an upper bound for the excess entropy, the statistical complexity is also an upper
bound for the excess entropy [Sha01, Cru03]:

    $E \le C_\varepsilon$. (4)

The next theorem characterizes when inequality (4) is strict.

Theorem 1 Given a stationary stochastic process $\overleftrightarrow{X}$ with excess entropy $E$ and statistical
complexity $C_\varepsilon$. Let its corresponding ε-machine have transition probabilities $T^{(r)}_{ij}$. Then
$C_\varepsilon > E$ if and only if there exists a non-zero probability that two different causal states
$S_j$ and $S_k$ will both make a transition to a coinciding causal state $S_l$ upon emission of a
coinciding output $r \in \Sigma$, i.e. $T^{(r)}_{jl}, T^{(r)}_{kl} \ne 0$.



2 A memory kernel is a Markov kernel $\gamma : \Sigma^{\mathbb{N}} \to \mathcal{P}(\mathcal{S})$. The associated random variable $M$ is called
memory variable or simply memory. A memory variable is called sufficient if $P(\overleftarrow{X} \in A, \overrightarrow{X} \in B \mid M) =
P(\overleftarrow{X} \in A \mid M) P(\overrightarrow{X} \in B \mid M)$ a.s. for all measurable sets $A, B$. A memory is minimal if every other
sufficient memory has at least the same number of internal states and the corresponding memory variable
has at least the same entropy (Corollary 3.21 in [Loe10]).

3 An invariant HMM $(T, \mu)$ with measurable spaces $(\Sigma, \mathcal{D})$ and $(\mathcal{S}, \mathcal{G})$ is called partially deterministic
if there is a measurable function $f : \mathcal{S} \times \Sigma \to \mathcal{S}$ (transition function) such that for $\mu$-almost all $s \in \mathcal{S}$ we
have $T(s)(G \times D) = K_s(D \cap f(s, \cdot)^{-1}(G))$ for all $D \in \mathcal{D}$, $G \in \mathcal{G}$, where $K_s(\cdot) := T(s)(\mathcal{S} \times \cdot)$ is the output
kernel.
Proof. Theorem 1 in [GuW11]. □
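
The condition in Theorem 1 is a simple pattern in the transition matrices: some column of some $T^{(r)}$ has two non-zero entries. A minimal sketch of such a check follows, assuming the hypothetical storage layout `T[r][j][l]` (emit `r`, move from state `j` to state `l`); both example machines are made up for illustration.

```python
def merging_transitions(T, tol=0.0):
    """List all (r, j, k, l) with j < k where T[r][j][l] and T[r][k][l] are both non-zero."""
    hits = []
    for r, M in T.items():
        n = len(M)
        for l in range(n):
            for j in range(n):
                for k in range(j + 1, n):
                    if M[j][l] > tol and M[k][l] > tol:
                        hits.append((r, j, k, l))
    return hits

# Hypothetical two-state machine in which both states can reach the same
# state on the same symbol.
noisy = {
    0: [[0.7, 0.0], [0.3, 0.0]],
    1: [[0.0, 0.3], [0.0, 0.7]],
}

# Hypothetical period-2 machine: state 0 emits 0 and moves to state 1,
# state 1 emits 1 and moves to state 0.
period2 = {
    0: [[0.0, 1.0], [0.0, 0.0]],
    1: [[0.0, 0.0], [1.0, 0.0]],
}
```

A non-empty result corresponds to the case $C_\varepsilon > E$ of the theorem, an empty result to $C_\varepsilon = E$, provided the input really is the ε-machine of the process.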

As a next step we extend the last theorem from ε-machines to minimal HMMs. We
now return to the general case and consider minimal HMMs, which we refer to as
minimal classical models. From the definitions of the internal entropies it is clear that

    $E \le C_{cl} \le C_\varepsilon$. (5)

There exist examples such that $C_{cl} < C_\varepsilon$ holds, and it is known that [Loe10]

    $C_{cl} < C_\varepsilon \implies E < C_{cl}$,

or, equivalently, by contraposition together with (5),

    $E = C_{cl} \implies C_{cl} = C_\varepsilon$. (6)

With this fact it is possible to generalize Theorem 1.

Theorem 2 Given a stationary stochastic process $\overleftrightarrow{X}$ with excess entropy $E$ and generative
complexity $C_{cl}$. Let its corresponding minimal HMM have transition probabilities
$T^{(r)}_{ij}$. Then $C_{cl} > E$ if and only if there exists a non-zero probability that two different
internal states $S_j$ and $S_k$ will both make a transition to a coinciding internal state $S_l$ upon
emission of a coinciding output $r \in \Sigma$, i.e. $T^{(r)}_{jl}, T^{(r)}_{kl} \ne 0$.

Proof. With (5) and (6) we get $E = C_{cl} \implies C_{cl} = C_\varepsilon$. With Theorem 1 and the
contrapositive of the last expression we obtain the result. □

Remark 2 Theorem 2 shows that there is a kind of redundancy in the minimal HMM
producing the gap between $E$ and $C_{cl}$. This redundancy is an indicator for a possible
improvement of the classical minimal HMM, see Theorem 3.



4 Hidden Quantum Markov Models and Holevo-Bound 

Based on the classical minimal HMM introduced in Section 2 it is possible to define
quantum models with the same statistical behaviour. In the spirit of classical HMMs we
define a quantum version of such models, introduced in [Mon11], to reproduce a given
stochastic process.

Definition 4 ([Mon11]) A quantum operation $\mathcal{K}_r : \mathrm{Mat}(d, \mathbb{C}) \to \mathrm{Mat}(d, \mathbb{C})$ is a
completely positive, trace non-increasing linear map on the space of complex $d \times d$ matrices
$\mathrm{Mat}(d, \mathbb{C})$. A Hidden Quantum Markov Model (HQMM) is a density matrix $\rho \in
\mathrm{Mat}(d, \mathbb{C})$ together with a set of quantum operations $\mathcal{K}_r$, $r \in \Sigma$, such that $\sum_{r \in \Sigma} \mathcal{K}_r$ is
trace-preserving. At every time step a symbol $r \in \Sigma$ is generated with probability $P(r) =
\mathrm{Tr}(\mathcal{K}_r \rho)$ and the state is updated to $\rho_r = \mathcal{K}_r \rho / P(r)$.

There is an analogy between classical HMMs and HQMMs: for example, the quantum
operation $\mathcal{K}_r$ plays the role of a substochastic matrix $T^{(r)}$ and the density matrix
corresponds to the probability vector $(p_1, \dots, p_n)$, see [Mon11] for more details. Furthermore, it
can be proved that for every transition-emitting HMM it is possible to construct an HQMM
with the same statistical behaviour, i.e. the HQMM generates the same stochastic process
[Mon11]. This constructed HQMM is in general not unique, and there are many possibilities
to construct an HQMM producing the same stochastic process. In this paper we only
consider constructions of HQMMs based on a given classical HMM. Before we write down
such an explicit construction we formulate the HQMM in the setting of a quantum
channel. For this we introduce the general setting of a quantum channel.

Consider a finite input alphabet $\mathcal{X}$ and a finite output alphabet $\mathcal{Y}$. Further let $\mathcal{H}$ and
$\mathcal{J}$ be the input and output Hilbert spaces. We want to transmit classical input data via a
quantum channel, that is, a completely positive, trace-preserving map $\mathcal{E} : B(\mathcal{H}) \to B(\mathcal{J})$,
where $B(\mathcal{H})$ is the algebra of bounded operators acting on $\mathcal{H}$. To do this, choose
an input random variable $X$ with values in $\mathcal{X}$ and with a corresponding distribution
$p : \mathcal{X} \to [0,1]$. Encode each $x \in \mathcal{X}$ in a quantum state $\rho_x \in B(\mathcal{H})$; after sending this state
through the quantum channel one can measure the output quantum state to get classical
data as output. For every $y \in \mathcal{Y}$ there is a completely positive operator $\mathcal{K}_y \in B(\mathcal{J})$ such
that $\sum_{y \in \mathcal{Y}} \mathcal{K}_y = I_{\mathcal{J}}$, where $I_{\mathcal{J}}$ denotes the identity operator on $\mathcal{J}$. With $\mathrm{Tr}_{\mathcal{J}}$ we denote
the partial trace with respect to $\mathcal{J}$. The probability that $y \in \mathcal{Y}$ is the output symbol,
given $x \in \mathcal{X}$ as input, is

    $T_{y,x} := \mathrm{Tr}_{\mathcal{J}}(\mathcal{E}(\rho_x) \mathcal{K}_y)$,

and the output distribution takes the form

    $p_y := \sum_{x \in \mathcal{X}} \mathrm{Tr}_{\mathcal{J}}(p_x \, \mathcal{E}(\rho_x) \mathcal{K}_y)$, for every $y \in \mathcal{Y}$.

The corresponding random variable with distribution $(p_y)_{y \in \mathcal{Y}}$ and values in $\mathcal{Y}$ is denoted
by $Y$.

We now give an explicit construction of an HQMM from a given HMM, which was defined in
[GuW11]. Without loss of generality let the finite alphabet be $\Sigma := \{1, \dots, M\}$.
Let a classical HMM $(T, (p_1, \dots, p_n))$ with internal states $\mathcal{S} = \{S_1, \dots, S_n\}$ be given. Choose
as input alphabet $\mathcal{X} := \{1, \dots, n\}$ and as output alphabet $\mathcal{Y} := \{1, \dots, n\} \times \Sigma$. The
Hilbert space takes the form $\mathcal{H} := \mathbb{C}^{nM} = \mathcal{J}$ and the quantum channel is defined as the
identity $\mathcal{E} := \mathrm{Id}_{B(\mathcal{H})}$. We encode every $i \in \mathcal{X}$ with a quantum internal state as follows:

    $|S_i\rangle := \sum_{r \in \Sigma} \sum_{j=1}^{n} \sqrt{T^{(r)}_{ij}} \, |j\rangle \otimes |r\rangle \in \mathcal{H}, \quad \forall i \in \{1, \dots, n\}$. (7)

The corresponding density matrix is defined as $\rho_i := |S_i\rangle\langle S_i|$. The HQMM takes the
form $\rho := \sum_{i=1}^{n} p_i \rho_i$ and is equipped with quantum operations $\mathcal{K}_{r,j} := P_{|j\rangle \otimes |r\rangle}$, which
are projections onto the space spanned by $|j\rangle \otimes |r\rangle$. Clearly $\sum_{r \in \Sigma, \, j \in \{1, \dots, n\}} \mathcal{K}_{r,j}$ is
trace-preserving. Consider $|S_i\rangle$ as an initial quantum internal state; then with the projections
$\mathcal{K}_{r,j}$ it follows that

    $T_{(j,r),i} = \mathrm{Tr}(\mathcal{E}(\rho_i) \mathcal{K}_{r,j}) = T^{(r)}_{ij}$

holds. We set $x_0 = r$ as output and prepare the next quantum internal state $|S_j\rangle$.
Repeating this procedure we get a sequence of symbols $x_0, x_1, \dots$ with the same probability
as produced by the classical HMM initialized in state $S_i$. This proves that this HQMM
has the same statistical behaviour as the classical HMM, which means that both models
have the same transition probabilities between equivalent states.
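
Because all amplitudes in the construction above are real and non-negative, the quantum internal states and their overlaps can be computed with elementary arithmetic. The sketch below stores each state as a dictionary from basis labels `(j, r)` to amplitudes. The layout `T[r][i][j]`, the function names, and the two-state example generator are hypothetical choices for illustration.

```python
import math

def quantum_states(T):
    """Amplitudes of |S_i> = sum_{r,j} sqrt(T[r][i][j]) |j>|r>, keyed by (j, r)."""
    n = len(next(iter(T.values())))
    return [
        {(j, r): math.sqrt(T[r][i][j]) for r in T for j in range(n) if T[r][i][j] > 0}
        for i in range(n)
    ]

def overlap(a, b):
    """Scalar product <a|b> of two states with real amplitudes."""
    return sum(amp * b.get(key, 0.0) for key, amp in a.items())

# Hypothetical two-state generator over the alphabet {0, 1}.
T = {
    0: [[0.7, 0.0], [0.3, 0.0]],
    1: [[0.0, 0.3], [0.0, 0.7]],
}
states = quantum_states(T)
```

Each state has unit norm because every state's outgoing probabilities sum to one, and the squared amplitude on `(j, r)` reproduces the classical transition probability. In this example the two states are not orthogonal, since both can emit the same symbol into the same successor state.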
Remark 3 In [GuW11] this construction is applied to classical ε-machines and the HQMM
is called quantum ε-machine. Since we are considering minimal HMMs, which need not
be ε-machines, we call the defined HQMM a quantum model induced by a minimal HMM.

The quantum internal state entropy of an HQMM is the von Neumann entropy

    $C_q := S(\rho) := -\mathrm{Tr} \, \rho \log \rho$.

$C_q$ is the quantum version of the classical internal state entropy $H(p)$ and is bounded by
this internal state entropy and in particular by the generative complexity $C_{cl}$.

Proposition 1 Suppose $\rho = \sum_{j=1}^{n} p_j \rho_j$, where $p = (p_j)_{j=1}^{n}$ is a probability vector with
$\sum_{j=1}^{n} p_j = 1$ and the $\rho_j := |S_j\rangle\langle S_j|$ are density operators for every $j \in \{1, \dots, n\}$. Then

    $C_q \le H(p)$,

with equality if and only if the quantum internal states $|S_j\rangle$ are mutually orthogonal. In
particular, given a minimal HMM $(T, p)$ with an induced quantum model $\rho$ we have

    $C_q \le C_{cl}$.

Proof. Theorem 11.10 in [Nie00] or alternatively an adaptation of Theorem 3.7 in [Pet08].

□
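
For a mixture of only two pure states the bound can be made explicit. A sketch under stated assumptions: for $\rho = p_1 |a\rangle\langle a| + p_2 |b\rangle\langle b|$ with real overlap $c = \langle a|b\rangle$, the two non-zero eigenvalues of $\rho$ are $\frac{1}{2}\big(1 \pm \sqrt{1 - 4 p_1 p_2 (1 - c^2)}\big)$, which follows from $\mathrm{Tr}\,\rho = 1$ and the determinant $p_1 p_2 (1 - c^2)$ on the span of the two states. This closed form and the function names are not taken from the paper, and entropies are computed in bits.

```python
import math

def binary_entropy(x):
    """Entropy in bits of the distribution (x, 1 - x), with 0*log(0) = 0."""
    return -sum(p * math.log2(p) for p in (x, 1.0 - x) if p > 0)

def two_state_quantum_entropy(p1, c):
    """S(rho) for rho = p1 |a><a| + (1 - p1) |b><b| with real overlap c = <a|b>."""
    p2 = 1.0 - p1
    # Eigenvalues of rho on span{|a>, |b>}: (1 +/- disc) / 2
    disc = math.sqrt(max(0.0, 1.0 - 4.0 * p1 * p2 * (1.0 - c * c)))
    return binary_entropy((1.0 + disc) / 2.0)
```

With orthogonal states (`c = 0`) the result equals the classical entropy of `(p1, 1 - p1)`, any non-zero overlap gives a strictly smaller value, and identical states (`c = 1`) give zero.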

Remark 4 In the case that the classical minimal HMM coincides with the classical
ε-machine, it is not clear whether the quantum internal states of the induced quantum model share
the same properties as the classical causal states, i.e. the question whether quantum internal
states are minimal sufficient in the sense of quantum mechanics is not yet answered.

The next proposition is the well-known Holevo bound and gives an upper bound for
the mutual information between classical input and classical output data.

Proposition 2 (Holevo bound) Given the setting above with classical input random
variable $X$ and classical output random variable $Y$, the following bound holds:

    $I(X;Y) \le S(\rho) - \sum_{i=1}^{n} p_i S(\rho_i)$, (8)

where $\rho = \sum_{i=1}^{n} p_i \rho_i$, with equality if and only if all $\rho_i$ commute.

Proof. Theorem 12.1 in [Nie00] or Theorem 7.3 in [Pet08]. For the equality condition see
for example [Rus02]. □

In the case that the HMM is an ε-machine, the left-hand side of (8) is the excess entropy.

Proposition 3 Let $(T, (p_1, \dots, p_n))$ be an ε-machine. Then, given the setting above, it holds
that

    $I(X;Y) = I(\overleftarrow{X}; \overrightarrow{X}) = E$.
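Proof. To prove the proposition we use a four-variable mutual information introduced in
[Yeu91] and follow the same strategy as in [Cru10]. For random variables $X, Y, Z, U$ we
define

    $I(X;Y;Z;U) := I(X;Y;Z) - I(X;Y;Z|U)$,
    $I(X;Y;Z) := I(X;Y) - I(X;Y|Z)$, with $I(X;Y|Z) := H(X|Z) - H(X|Y,Z)$,
    $I(X;Y;Z|U) := I(X;Y|U) - I(X;Y|Z,U)$, with $I(X;Y|Z,U) := H(X|Z,U) - H(X|Z,U,Y)$.

Furthermore we use the following two identities, which hold for a measurable function $f$
of a random variable $X$ ([Gra11], Lemma 3.12):

    $H(f(X)|X) = 0, \quad H(X, f(X)) = H(X)$. (9)

We define mappings $g : \Sigma^{\mathbb{N}} \to \mathcal{X}$ with $g(a) := j$ if $a \in S_j$, and $f : \Sigma^{\mathbb{N}_0} \to \mathcal{Y}$ with
$f(a a_0) := (i, a_0)$ if $a \in S_i$. Since we are considering ε-machines, $g$ and $f$ are well-defined
and measurable. Thus we can write $X = g(\overleftarrow{X})$, $Y = f(\overrightarrow{X})$, and using (9) we get

    $H(Y|\overrightarrow{X}) = 0, \quad H(X|\overleftarrow{X}) = 0$, (10)

    $H(\overrightarrow{X}, Y) = H(\overrightarrow{X}), \quad H(\overleftarrow{X}, X) = H(\overleftarrow{X})$, (11)

    $H(\overleftarrow{X}|\overrightarrow{X}, Y) = H(\overleftarrow{X}|Y), \quad H(\overrightarrow{X}|\overleftarrow{X}, X) = H(\overrightarrow{X}|X)$. (12)

In a first step we show $I(\overleftarrow{X}; \overrightarrow{X}; X; Y) = I(\overleftarrow{X}; \overrightarrow{X}) = E$. Consider

    $I(\overleftarrow{X}; \overrightarrow{X}; X|Y) = I(\overleftarrow{X}; \overrightarrow{X}|Y) - I(\overleftarrow{X}; \overrightarrow{X}|X, Y)$; (13)

the first term vanishes because with (12) it holds that

    $I(\overleftarrow{X}; \overrightarrow{X}|Y) = H(\overleftarrow{X}|Y) - H(\overleftarrow{X}|\overrightarrow{X}, Y) = 0$.

The second term of (13) is also zero, since by (12)

    $I(\overleftarrow{X}; \overrightarrow{X}|X, Y) = H(\overleftarrow{X}|X, Y) - H(\overleftarrow{X}|X, Y, \overrightarrow{X}) = 0$.

Putting everything together we obtain

    $I(\overleftarrow{X}; \overrightarrow{X}; X|Y) = 0$.

Furthermore we have

    $I(\overleftarrow{X}; \overrightarrow{X}; X) = I(\overleftarrow{X}; \overrightarrow{X}) - I(\overleftarrow{X}; \overrightarrow{X}|X) = I(\overleftarrow{X}; \overrightarrow{X})$,

since $I(\overleftarrow{X}; \overrightarrow{X}|X) = H(\overrightarrow{X}|X) - H(\overrightarrow{X}|\overleftarrow{X}, X) = 0$ by (12). Finally we get

    $I(\overleftarrow{X}; \overrightarrow{X}; X; Y) = I(\overleftarrow{X}; \overrightarrow{X})$.

In a second step we show $I(\overleftarrow{X}; \overrightarrow{X}; X; Y) = I(X;Y)$, using that the four-variable mutual
information is symmetric in its arguments. As in the first step, the following term vanishes:

    $I(X; Y; \overleftarrow{X}|\overrightarrow{X}) = I(X; Y|\overrightarrow{X}) - I(X; Y|\overleftarrow{X}, \overrightarrow{X}) = 0$, (14)

since by (10) $I(X; Y|\overrightarrow{X}) = H(Y|\overrightarrow{X}) - H(Y|X, \overrightarrow{X}) = 0$ and

    $I(X; Y|\overleftarrow{X}, \overrightarrow{X}) = H(X|\overleftarrow{X}, \overrightarrow{X}) - H(X|Y, \overleftarrow{X}, \overrightarrow{X}) = 0$.

Consider now

    $I(X; Y; \overleftarrow{X}) = I(X;Y) - I(X; Y|\overleftarrow{X})$;

the second term vanishes, since by (10)

    $I(X; Y|\overleftarrow{X}) = H(X|\overleftarrow{X}) - H(X|Y, \overleftarrow{X}) = 0$.

Thus we obtain

    $I(\overleftarrow{X}; \overrightarrow{X}; X; Y) = I(X;Y)$,

and finally we get

    $E = I(\overleftarrow{X}; \overrightarrow{X}) = I(X;Y)$.

□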

The converse of Proposition 3 is not true, as can be seen in the example treated in
Section 6.

Remark 5 In general it is difficult to calculate the excess entropy of a given stationary
stochastic process. If an ε-machine for a process is given, it is easy to calculate
$I(X;Y)$, which coincides with the excess entropy $E$. Compared to the method in [Ell09],
which uses the structure of the ε-machine, this is an alternative method to calculate $E$.

Since $I(X;Y)$ depends on the classical HMM we sometimes write $I_{HMM}(X;Y)$ if a
distinction is necessary. For general HMMs, and especially for minimal HMMs which are
not ε-machines, the excess entropy is in general smaller than $I(X;Y)$, as the next example
shows. This example can be found in [Loe09c].

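Example 1 Let $\Sigma := \{0, 1\}$ and consider a stationary Markov process generated by the
ε-machine $(T, (p_0, p_1))$ with $p_0 = p_1 = \frac{1}{2}$ and

    $T^{(0)} = \begin{pmatrix} \frac{1}{2}(1+\varepsilon) & 0 \\ \frac{1}{2}(1-\varepsilon) & 0 \end{pmatrix}, \quad
     T^{(1)} = \begin{pmatrix} 0 & \frac{1}{2}(1-\varepsilon) \\ 0 & \frac{1}{2}(1+\varepsilon) \end{pmatrix}$,

where $0 \le \varepsilon \le 1$. The statistical complexity is $C_\varepsilon = 1$ for $\varepsilon > 0$, and the excess entropy
amounts to

    $E = \frac{1}{2} \big( (1+\varepsilon) \log(1+\varepsilon) + (1-\varepsilon) \log(1-\varepsilon) \big)$,

and coincides with $I_{Markov}(X;Y)$. We now give an HMM which generates the same process
(see [Loe09c]), but with three internal states and smaller internal state entropy than $C_\varepsilon$.
Let $\mathcal{S} := \{0, 1, 2\}$ with

    $T^{(0)} = \begin{pmatrix} \varepsilon & 0 & 1-\varepsilon \\ 0 & 0 & 0 \\ \frac{\varepsilon}{2} & 0 & \frac{1-\varepsilon}{2} \end{pmatrix}, \quad
     T^{(1)} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \varepsilon & 1-\varepsilon \\ 0 & \frac{\varepsilon}{2} & \frac{1-\varepsilon}{2} \end{pmatrix}$,

and initial distribution $(p_0, p_1, p_2)$

    $p_i = \begin{cases} \frac{\varepsilon}{2}, & \text{if } i \in \{0, 1\} \\ 1 - \varepsilon, & \text{if } i = 2. \end{cases}$

The internal state entropy of this HMM is given by

    $H(p) = -(1-\varepsilon) \log(1-\varepsilon) - \varepsilon \log(\frac{\varepsilon}{2})$.

It is easy to calculate the left-hand side of the Holevo bound:

    $I_{3state}(X;Y) = \varepsilon$.

For $\varepsilon \in (0, 1)$ the excess entropy is always strictly smaller than $I_{3state}(X;Y)$. In particular,
for $\varepsilon$ small enough the three-state HMM has smaller internal state entropy $H(p)$ than the
ε-machine, as can be seen in Figure 1.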




Figure 1: Excess entropy $E$, $I_{3state} := I_{3state}(X;Y)$, $C_q^{3state}$, $I_{Markov} := I_{Markov}(X;Y)$,
$C_q^{Markov}$, internal state entropy $H(p)$ of the three-state HMM described in Example 1, and
statistical complexity of the ε-machine.



Furthermore, Löhr showed in [Loe09c] that the internal state entropy of the minimal
HMM is bounded from below by

    $C_{cl} \ge -(1 - \varepsilon/2) \log(1 - \varepsilon/2) - (\varepsilon/2) \log(\varepsilon/2)$,

where the lower bound coincides with the internal state entropy $C_q^{3state}$ of the quantum
model induced by the three-state HMM. This example shows that it is possible that the
excess entropy is smaller than the lower bound $I_{3state}(X;Y)$ of $C_q^{3state}$ given by the Holevo
bound. Furthermore, it also shows that even if the three-state HMM has smaller internal
entropy for sufficiently small $\varepsilon$, the internal state entropy $C_q^{Markov}$ of the quantum model
induced by the Markov model is strictly smaller than $C_q^{3state}$ and in particular smaller than
$I_{3state}(X;Y)$, see Figure 1. So it is not clear at all how minimal classical models and
minimal quantum models are related to each other.
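
The quantities of Example 1 can be evaluated numerically for a fixed parameter value. The sketch below is an illustration under stated assumptions, not part of the original example: logarithms are taken to base 2, the three-state generator is entered in the form in which we read it from Example 1 with storage layout `T[r][i][j]`, and $C_q^{Markov}$ is computed from the eigenvalues $\frac{1}{2}(1 \pm \sqrt{1 - \varepsilon^2})$, which we derived from construction (7) for the two-state ε-machine.

```python
import math

def h(x):
    """Binary entropy in bits, with 0*log(0) = 0."""
    return -sum(p * math.log2(p) for p in (x, 1.0 - x) if p > 0)

def hmm_mutual_information(T, p):
    """I(X;Y) for P(X=i) = p[i] and P(Y=(j,r) | X=i) = T[r][i][j]."""
    n = len(p)
    py = {(j, r): sum(p[i] * T[r][i][j] for i in range(n)) for r in T for j in range(n)}
    total = 0.0
    for i in range(n):
        for (j, r), q in py.items():
            pxy = p[i] * T[r][i][j]
            if pxy > 0:
                total += pxy * math.log2(pxy / (p[i] * q))
    return total

def example1(eps):
    """Entropies of Example 1 in bits for a parameter 0 < eps < 1."""
    T3 = {
        0: [[eps, 0, 1 - eps], [0, 0, 0], [eps / 2, 0, (1 - eps) / 2]],
        1: [[0, 0, 0], [0, eps, 1 - eps], [0, eps / 2, (1 - eps) / 2]],
    }
    p3 = [eps / 2, eps / 2, 1 - eps]
    return {
        "E": 0.5 * ((1 + eps) * math.log2(1 + eps) + (1 - eps) * math.log2(1 - eps)),
        "I_3state": hmm_mutual_information(T3, p3),
        "Cq_3state": h(eps / 2),
        "H_p": -(1 - eps) * math.log2(1 - eps) - eps * math.log2(eps / 2),
        # Assumption: eigenvalues (1 +/- sqrt(1 - eps^2)) / 2 for the Markov model
        "Cq_markov": h((1 + math.sqrt(1 - eps * eps)) / 2),
    }

values = example1(0.1)
```

For `eps = 0.1` the computed mutual information of the three-state HMM agrees with $\varepsilon$, the chain $E < I_{3state} < C_q^{3state} < H(p)$ holds, and $C_q^{Markov}$ lies below $I_{3state}$, in line with the discussion above.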

Since the states $\rho_i = |S_i\rangle\langle S_i|$ are pure, we have $S(\rho_i) = 0$, so that Proposition 1 and
Proposition 2 imply that in general

    $E \le I(X;Y) \le C_q \le C_{cl}$ (15)

holds.
Remark 6 Inequality (15) allows us to compare the information stored in a classical
minimal HMM and an induced quantum model which generate the same stochastic process.
In order to compare the quantum internal state entropies of different HQMM constructions
with the internal state entropy of a given classical minimal HMM we have to ensure that
(15) holds. Considering the right-hand side of (8), the second term has to vanish, i.e. the
internal states of such an HQMM have to fulfil $S(\rho_i) = 0$ for all $i \in \{1, \dots, n\}$.

Gu et al. proved in [GuW11] a remarkable theorem for classical ε-machines: if
$C_\varepsilon > E$ holds, then the induced quantum model (7) has internal state entropy
strictly smaller than the internal state entropy of the classical ε-machine, $C_q < C_\varepsilon$. We
extend this result to classical minimal HMMs.

Theorem 3 Given a stationary stochastic process $\overleftrightarrow{X}$ with excess entropy $E$ and generative
complexity $C_{cl}$ such that $C_{cl} > E$. Then there exists a quantum system that exhibits
identical statistics with internal state entropy $C_q < C_{cl}$.

Proof. Use Theorem 2 instead of Theorem 1 in the proof of Theorem 2 in [GuW11]. □

In the next section we investigate equality conditions for these different internal state 
entropies. 



5 Equality conditions 

The next two propositions characterize when equality holds in the last two
inequalities of (15).



Proposition 4 Given a stationary stochastic process with excess entropy $E$ and generative
complexity $C_{cl}$. Let the corresponding induced quantum model defined in (7) have
quantum internal state entropy $C_q$. Then it holds that $E = I(X;Y) = C_q = C_{cl}$ if and
only if all quantum internal states are mutually orthogonal.

Proof. "⇒": Suppose that $E = I(X;Y) = C_q = C_{cl}$. Theorem 2 gives us that for each
output $r \in \Sigma$, each index $l \in \{1, \dots, n\}$ and each pair of indices $j \ne k$ one of
the transition probabilities $T^{(r)}_{jl}$, $T^{(r)}_{kl}$ is zero. With the definition of the quantum internal
states (7) this implies $\langle S_j|S_k\rangle = 0$ for all indices $j \ne k$.

"⇐": By (7) the scalar product is $\langle S_j|S_k\rangle = \sum_{r \in \Sigma} \sum_{l=1}^{n} \sqrt{T^{(r)}_{jl} T^{(r)}_{kl}}$, a sum of
non-negative terms. Hence $\langle S_j|S_k\rangle = 0$ for all indices $j \ne k$ implies
that one of $T^{(r)}_{jl}$, $T^{(r)}_{kl}$ is zero for each output $r \in \Sigma$, index $l \in \{1, \dots, n\}$ and pair of
indices $j \ne k$. Again with Theorem 2 we get $E = C_{cl}$. Together with (15) it follows that
$E = I(X;Y) = C_q = C_{cl}$. □

Proposition 5 Given a stationary stochastic process $\overleftrightarrow{X}$ with excess entropy $E$. For a
given classical HMM generating $\overleftrightarrow{X}$ with internal state entropy $H(\mu)$ let the corresponding
induced quantum model defined in (7) have quantum internal state entropy $C_q$. Then it
holds that $E \le I(X;Y) = C_q < H(\mu)$ if and only if there exist at least two quantum
internal states which are identical and all other quantum internal states are mutually
orthogonal or also identical (i.e. $\exists \, k \ne i : \langle S_k|S_i\rangle = 1$, and $\langle S_l|S_j\rangle$ is 0 or 1 for all other
indices $l \ne j$).
Proof. "⇒": Since $C_q < H(p)$ it follows from Proposition 1 that not all quantum internal
states are mutually orthogonal, i.e. there exists at least one pair of indices $i \ne k$ such
that $\langle S_i|S_k\rangle \ne 0$. Furthermore, Proposition 2 implies that $I(X;Y) = C_q$ if and only if all
density operators $\rho_i = |S_i\rangle\langle S_i|$ commute. It is easy to prove that all $\rho_i$ commute if and
only if $\langle S_i|S_k\rangle = 0$ or $\langle S_i|S_k\rangle = 1$ for all indices $i, k \in \{1, \dots, n\}$. From this equivalence
the claim follows.

"⇐": There exists at least one pair of indices $i \ne k$ such that $\langle S_i|S_k\rangle = 1$. Together with
the definition of quantum internal states there is an $r \in \Sigma$ and an index $l \in \{1, \dots, n\}$ such
that $T^{(r)}_{il} \ne 0$ and $T^{(r)}_{kl} \ne 0$. Since not all quantum internal states are mutually orthogonal
it follows from Proposition 1 that $C_q < H(p)$. From the Holevo bound (Proposition 2)
we know that $I(X;Y) \le C_q$ with equality if and only if all density operators $\rho_i = |S_i\rangle\langle S_i|$
commute, which is again equivalent to the condition that $\langle S_i|S_k\rangle = 1$ or $\langle S_i|S_k\rangle = 0$ for
all indices $i, k \in \{1, \dots, n\}$. Hence $I(X;Y) = C_q$ follows. □
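
The case distinction used in these proofs depends only on the pairwise overlaps of the quantum internal states. As a rough sketch, the following helper sorts a real Gram matrix `gram[i][k]` $= \langle S_i|S_k\rangle$ into the three cases that Propositions 1 and 2 separate for pure states; the function name, the labels and the tolerance are hypothetical choices for illustration.

```python
def classify_overlaps(gram, tol=1e-12):
    """Classify a real Gram matrix of unit vectors by its off-diagonal overlaps."""
    n = len(gram)
    pairs = [abs(gram[i][k]) for i in range(n) for k in range(i + 1, n)]
    if all(c <= tol for c in pairs):
        # All states mutually orthogonal: C_q = H(mu) by Proposition 1
        return "orthogonal"
    if all(c <= tol or abs(c - 1.0) <= tol for c in pairs):
        # Overlaps only 0 or 1: the rho_i commute, so I(X;Y) = C_q < H(mu)
        return "commuting"
    # Some overlap strictly between 0 and 1: I(X;Y) < C_q < H(mu)
    return "generic"
```

For example, `classify_overlaps([[1, 0.5], [0.5, 1]])` falls into the generic case, in which both of the last two inequalities of the chain are strict.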

A direct consequence of Proposition 5 is that if $E \le I(X;Y) = C_q < H(p)$ there
exist two identical quantum internal states $\langle S_i| = \langle S_k|$, $i \neq k$. This implies that for all
$r \in \Sigma$ and all indices $l \in \{1, \dots, n\}$ it holds that $T^{(r)}_{il} = T^{(r)}_{kl}$. This means that in the
corresponding classical HMM there are two states which are redundant and can be merged
into one state. This HMM is not a classical minimal HMM for the underlying stochastic
process, as the next proposition shows.
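
The step from identical quantum internal states to identical rows of the Markov kernels can be spelled out as follows. This is a sketch under the assumption that the states in (7) have the form $|S_j\rangle = \sum_{r}\sum_{l} \sqrt{T^{(r)}_{jl}}\,|r\rangle|l\rangle$; that form is not restated in this section and is assumed here.

```latex
\begin{align*}
\langle S_i | S_k \rangle
  &= \sum_{r \in \Sigma} \sum_{l=1}^{n} \sqrt{T^{(r)}_{il}\, T^{(r)}_{kl}} \\
  &\le \Bigl( \sum_{r,l} T^{(r)}_{il} \Bigr)^{1/2}
       \Bigl( \sum_{r,l} T^{(r)}_{kl} \Bigr)^{1/2} = 1 .
\end{align*}
```

The inequality is Cauchy-Schwarz, and both sums equal 1 because each state's outgoing probabilities sum to one. Equality requires the two vectors of square roots to be proportional; since both have unit norm and non-negative entries, this happens exactly when $T^{(r)}_{il} = T^{(r)}_{kl}$ for all $r$ and $l$.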

Proposition 6 Given a stationary stochastic process $X$ with excess entropy $E$. For a
given classical HMM generating $X$ with internal state entropy $H(p)$ let the corresponding
induced quantum model defined in (7) have quantum internal state entropy $C_q$. The
classical HMM corresponding to the induced quantum model in the case
$E \le I(X;Y) = C_q < H(p)$ is not a classical minimal HMM and therefore does not have
minimal classical internal state entropy.

Proof. Suppose that the classical HMM corresponding to the induced quantum model
is a minimal HMM (i.e. $H(p) = C_{cl}$). Then one can remove all redundant states in this
classical HMM, and in the resulting induced quantum model there remain only mutually
orthogonal quantum internal states. With Proposition 4 we have $E = I(X;Y) = C_q = C_{cl}$ and the
reduced classical HMM is in fact the minimal HMM, which is an ideal model. So the
unreduced classical HMM cannot be the minimal HMM, which contradicts the
assumption, and the claim is proved. □

The last proposition implies that the case $E \le I(X;Y) = C_q < C_{cl}$ cannot exist.

Remark 7 The case $E \le I(X;Y) < C_q = C_{cl}$ does not exist. Suppose this case exists.
Then Proposition 1 would imply that all quantum internal states are mutually orthogonal
and Proposition 4 implies $E = I(X;Y) = C_q = C_{cl}$, which contradicts the
assumption.

That is, given a minimal classical HMM, either the classical HMM
is as good as the induced quantum model, or the induced quantum model has a quantum
internal state entropy $C_q$ strictly smaller than $C_{cl}$ and strictly greater than $I(X;Y)$. We
summarize the different cases:

(i) $E = I(X;Y) = C_q = C_{cl}$ $\Leftrightarrow$ the classical HMM and the induced quantum model
are both optimal and all quantum internal states are mutually orthogonal.

(ii) $E \le I(X;Y) = C_q < C_{cl}$ is not possible.

(iii) $E \le I(X;Y) = C_q < H(p)$ $\Leftrightarrow$ the corresponding classical model contains redundant
states and is not a minimal HMM, and the induced quantum model contains
only orthogonal or identical states, with at least two identical states.

(iv) $E \le I(X;Y) < C_q = C_{cl}$ is not possible.

(v) $E \le I(X;Y) < C_q < C_{cl}$ $\Leftrightarrow$ the classical HMM can be optimal and there exist
quantum internal states which are neither orthogonal nor identical.

So if one chooses an optimal classical HMM which is not an ideal classical model, there
is always an induced quantum model which is nearer to an ideal model but never achieves
such an ideal model.
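
The possible cases (i), (iii) and (v) are distinguished purely by the pairwise overlaps of the quantum internal states. A minimal sketch of that decision in Python is given below, assuming the overlaps are supplied as a symmetric nested list; the function name `classify_overlaps` and the string labels are illustrative only.

```python
def classify_overlaps(gram, tol=1e-9):
    """Map the pairwise overlaps <S_i|S_k>, i != k, to case (i), (iii) or (v).

    gram is a symmetric nested list with gram[i][k] = <S_i|S_k>.
    """
    n = len(gram)
    pairs = [gram[i][k] for i in range(n) for k in range(i + 1, n)]
    if all(abs(g) <= tol for g in pairs):
        return "(i)"      # all states mutually orthogonal
    if all(abs(g) <= tol or abs(g - 1.0) <= tol for g in pairs):
        return "(iii)"    # only overlaps 0 or 1, at least one equal to 1
    return "(v)"          # some pair is neither orthogonal nor identical

# Overlap pattern with one identical pair and everything else orthogonal.
print(classify_overlaps([[1, 0, 0], [0, 1, 1], [0, 1, 1]]))
```

The sketch only inspects the overlaps; whether the underlying classical HMM is minimal has to be established separately, as in Proposition 6.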



6 Calculation Example 

The following example illustrates the propositions shown in the preceding sections. We
consider the Random Noisy Copy HMM (RnC) [Ell09]. This HMM generates a binary
stochastic output process. It is given by a binary alphabet $\Sigma = \{0, 1\}$, the internal states
$S = \{A, B, C\}$ (which are also the causal states) and the Markov kernels

$$T^{(0)} = \begin{pmatrix} 0 & p & 0 \\ 1 & 0 & 0 \\ q & 0 & 0 \end{pmatrix}, \qquad
T^{(1)} = \begin{pmatrix} 0 & 0 & 1-p \\ 0 & 0 & 0 \\ 1-q & 0 & 0 \end{pmatrix},$$

with $0 < p, q < 1$. Figure 2 (a) shows a graphical representation of the RnC HMM.




Figure 2: (a) Minimal HMM for the RnC process. Nodes denote the internal states of
the HMM and edge labels $t|x$ give the probability $t = T^{(x)}_{SS'}$ of making a transition from
$S$ to $S'$ and seeing symbol $x$. (b) Minimal HMM for the underlying process in the case
$q = 1$.



The RnC HMM coincides with the classical $\epsilon$-machine. The left eigenvector of the
stochastic matrix $T^{(0)} + T^{(1)}$ gives us the stationary distribution over the internal states

$$P(S) = \tfrac{1}{2}\,(1,\; p,\; 1-p).$$

This allows us to calculate the generative complexity (which is identical with the statistical
complexity)

$$C_{cl} = 1 + \frac{H(p)}{2},$$

where $H(p) = -p\log(p) - (1-p)\log(1-p)$ is the binary entropy function. In this
section the logarithm is taken to base 2. With more sophisticated techniques (see [Ell09]
for calculation details) or with Proposition 5 one can also calculate the excess entropy
directly

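$$E = I(X;Y) = 1 + \frac{H(p)}{2} - \frac{p + q(1-p)}{2}\, H\!\left(\frac{p}{p + q(1-p)}\right).$$

The quantum internal states defined in (7) are, written in a suitable orthonormal basis,

$$|A\rangle = \begin{pmatrix} 0 \\ 0 \\ \sqrt{p} \\ \sqrt{1-p} \end{pmatrix}, \qquad
|B\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad
|C\rangle = \begin{pmatrix} \sqrt{q} \\ \sqrt{1-q} \\ 0 \\ 0 \end{pmatrix}.$$

The eigenvalues of $\rho = \tfrac{1}{2}\left(|S_0\rangle\langle S_0| + p\,|S_1\rangle\langle S_1| + (1-p)\,|S_2\rangle\langle S_2|\right)$, with $|S_0\rangle = |A\rangle$, $|S_1\rangle = |B\rangle$ and $|S_2\rangle = |C\rangle$, are

$$\left\{ \tfrac{1}{2},\; \tfrac{1}{4}\left(1 \pm \sqrt{1 - 4p + 4p^2 + 4pq - 4p^2 q}\right) \right\}.$$

Setting $\eta(x) := -x\log(x)$, the internal entropy of the induced quantum model amounts to

$$C_q = \eta\!\left(\tfrac{1}{2}\right)
 + \eta\!\left(\tfrac{1}{4}\left(1 + \sqrt{1 - 4p + 4p^2 + 4pq - 4p^2 q}\right)\right)
 + \eta\!\left(\tfrac{1}{4}\left(1 - \sqrt{1 - 4p + 4p^2 + 4pq - 4p^2 q}\right)\right).$$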

Fixing the parameter $q$ to certain values and varying $p$ we obtain the different cases
described in Section 5. For this we calculate the scalar products between the quantum
internal states: $\langle A|B\rangle = \langle A|C\rangle = 0$ and $\langle B|C\rangle = \sqrt{q}$. Setting $q = 0$, all quantum internal
states are mutually orthogonal and we are in case (i), which is shown in Figure 3 (a).
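
The three quantities compared in this example can be evaluated numerically. The sketch below is a plain-Python transcription, assuming the closed forms $C_{cl} = 1 + H(p)/2$, $E = 1 + H(p)/2 - \tfrac{1}{2}(p + q(1-p))\,H\bigl(p/(p + q(1-p))\bigr)$ and the eigenvalues $\tfrac12$, $\tfrac14(1 \pm \sqrt{1 - 4p + 4p^2 + 4pq - 4p^2q})$ of $\rho$; the function names `h2`, `eta`, `c_cl`, `excess_entropy` and `c_q` are illustrative.

```python
import math

def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def eta(x):
    """eta(x) = -x log2(x), with eta(x) = 0 for x <= 0 (guards rounding)."""
    return 0.0 if x <= 0.0 else -x * math.log2(x)

def c_cl(p):
    """Generative complexity of the RnC process."""
    return 1.0 + h2(p) / 2.0

def excess_entropy(p, q):
    """Excess entropy E = I(X;Y) of the RnC process, for 0 < p < 1."""
    s = p + q * (1.0 - p)
    return 1.0 + h2(p) / 2.0 - (s / 2.0) * h2(p / s)

def c_q(p, q):
    """Internal entropy of the induced quantum model from the eigenvalues of rho."""
    disc = 1.0 - 4.0 * p + 4.0 * p * p + 4.0 * p * q - 4.0 * p * p * q
    root = math.sqrt(max(0.0, disc))
    return eta(0.5) + eta((1.0 + root) / 4.0) + eta((1.0 - root) / 4.0)

for q in (0.0, 1.0, 0.7):
    p = 0.3
    print(q, excess_entropy(p, q), c_q(p, q), c_cl(p))
```

For $q = 0$ the three values should agree, for $q = 1$ the first two should agree and lie below the third, and for intermediate $q$ all three should differ.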





Figure 3: Generative complexity $C_{cl}$, quantum internal entropy $C_q$ and excess entropy
$E = I(X;Y)$ for the RnC process with different $q$-values: (a) $q = 0$, (b) $q = 1$, (c) $q = 0.7$.



For $q = 1$ the quantum internal states $|B\rangle$ and $|C\rangle$ are identical while $|A\rangle$ and $|B\rangle$ are
orthogonal. Thus we are in case (iii), as seen in Figure 3 (b). For $0 < q < 1$ we are in case
(v) and have a gap between $E$, $C_q$ and $C_{cl}$, as depicted in Figure 3 (c) for $q = 0.7$.

For $q = 1$ the states $|B\rangle$ and $|C\rangle$ are identical and the corresponding classical HMM is
not an $\epsilon$-machine, but still $E = I(X;Y)$ holds for this model. This shows that the converse
of Proposition 3 is not true. In the corresponding classical model (Fig. 2 (a)) the states
$B$ and $C$ can be merged into a state $BC$ (see Fig. 2 (b)). This is the classical minimal HMM
for the underlying process.
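
Merging two redundant states can be sketched in a few lines of Python. The representation is an assumption made for illustration: each labelled kernel is a nested list whose entry `[s][t]` is the probability of moving from state `s` to state `t` while emitting the symbol, with the RnC states ordered A, B, C. The helper names `rnc_kernels`, `rows_identical` and `merge_states` are hypothetical.

```python
def rnc_kernels(p, q):
    """Labelled kernels of the RnC HMM, states ordered A, B, C."""
    return {
        0: [[0.0, p, 0.0], [1.0, 0.0, 0.0], [q, 0.0, 0.0]],
        1: [[0.0, 0.0, 1.0 - p], [0.0, 0.0, 0.0], [1.0 - q, 0.0, 0.0]],
    }

def rows_identical(kernels, i, k, tol=1e-12):
    """True if states i and k have the same outgoing row in every kernel."""
    return all(abs(a - b) <= tol
               for t in kernels.values() for a, b in zip(t[i], t[k]))

def merge_states(kernels, i, k):
    """Merge redundant state k into state i (requires i < k)."""
    if not i < k:
        raise ValueError("expected i < k")
    if not rows_identical(kernels, i, k):
        raise ValueError("states are not redundant")
    merged = {}
    for symbol, t in kernels.items():
        new_t = []
        for s, row in enumerate(t):
            if s == k:
                continue              # drop the duplicate state's row
            new_row = list(row)
            new_row[i] += new_row[k]  # redirect transitions into k towards i
            del new_row[k]
            new_t.append(new_row)
        merged[symbol] = new_t
    return merged

# For q = 1 the states B (index 1) and C (index 2) are redundant.
print(merge_states(rnc_kernels(0.25, 1.0), 1, 2))
```

The result is a two-state HMM with states A and BC, in which A emits 0 or 1 and moves to BC, and BC emits 0 and returns to A.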

7 Alternative HQMMs 

The induced quantum model (7) is not the only possible HQMM
construction that models a given stochastic process. In this section we present an alternative
HQMM construction which is also able to model a stochastic process generated
by a corresponding classical minimal HMM. For this we follow the construction suggested
in [Mon11]. Given a classical minimal HMM $(T, (p_1, \dots, p_n))$ with internal states
$S = \{S_1, \dots, S_n\}$ we define the internal states of the quantum model as $|i\rangle$ for $i \in \{1, \dots, n\}$.
Furthermore we have $\rho_i := |i\rangle\langle i|$ and define quantum operations with a sum representation⁴

$$\mathcal{K}_r \rho := \sum_{i,j=1}^{n} K^r_{ij}\, \rho\, (K^r_{ij})^{*}, \qquad K^r_{ij} := \sqrt{T^{(r)}_{ji}}\; |i\rangle\langle j|,$$

for every symbol $r \in \Sigma$. With $\mathcal{K}_r \rho_j = \sum_{i=1}^{n} T^{(r)}_{ji}\, \rho_i$ we have

$$P(X = r \mid S_j) = \mathrm{Tr}(\mathcal{K}_r \rho_j) = \sum_{i=1}^{n} T^{(r)}_{ji} = \sum_{i=1}^{n} P(X = r, S_i \mid S_j),$$

and thus have the same transition probabilities as in the classical minimal HMM.
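
The action of the quantum operation on a state $\rho_j$ can be worked out explicitly. The following sketch assumes Kraus operators of the form $K^r_{ik} = \sqrt{T^{(r)}_{ki}}\,|i\rangle\langle k|$, where $T^{(r)}_{ki}$ is read as the probability of a transition from $S_k$ to $S_i$ while emitting $r$; this index convention is an assumption made here for consistency with the Markov kernels of Section 6.

```latex
\begin{align*}
\mathcal{K}_r \rho_j
  &= \sum_{i,k=1}^{n} T^{(r)}_{ki}\,
     |i\rangle\langle k|j\rangle\langle j|k\rangle\langle i|
   = \sum_{i=1}^{n} T^{(r)}_{ji}\, |i\rangle\langle i| , \\
\sum_{r \in \Sigma} \sum_{i,k=1}^{n} (K^r_{ik})^{*} K^r_{ik}
  &= \sum_{k=1}^{n} \Bigl( \sum_{r \in \Sigma} \sum_{i=1}^{n} T^{(r)}_{ki} \Bigr)
     |k\rangle\langle k|
   = \mathbb{1} .
\end{align*}
```

The first line uses $\langle k | j \rangle = \delta_{kj}$, and taking the trace gives the emission probability $\sum_i T^{(r)}_{ji}$. The second line shows that the operations sum to a trace-preserving map, because the outgoing probabilities of every state $S_k$ sum to one.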

The quantum internal state entropy $C_q$ of this quantum model always coincides with
the generative complexity of the process:

$$C_q = S\Bigl(\sum_{i=1}^{n} p_i\, \rho_i\Bigr) = H\bigl(\{p_i\}_{i=1}^{n}\bigr) = C_{cl}.$$

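In the next example, treated in [Mon11], we will see that in general $I(X;Y)$ is strictly
smaller than $C_q$ and Proposition 4 is not true for this type of HQMM construction.
Consider the stochastic process generated by a classical 4-symbol HMM (which is minimal
and coincides with the classical $\epsilon$-machine) with internal states $S = \{U, D, R, L\}$ and
transition matrices

$$T^{(0)} = \begin{pmatrix} 1/2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1/4 & 0 & 0 & 0 \\ 1/4 & 0 & 0 & 0 \end{pmatrix}, \quad
T^{(1)} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 1/2 & 0 & 0 \\ 0 & 1/4 & 0 & 0 \\ 0 & 1/4 & 0 & 0 \end{pmatrix}, \quad
T^{(2)} = \begin{pmatrix} 0 & 0 & 1/4 & 0 \\ 0 & 0 & 1/4 & 0 \\ 0 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad
T^{(3)} = \begin{pmatrix} 0 & 0 & 0 & 1/4 \\ 0 & 0 & 0 & 1/4 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1/2 \end{pmatrix}. \tag{16}$$

Figure 4: Classical 4-symbol HMM defined by equations (16).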



Figure 4 shows a graphical representation of this HMM.
We obtain as a stationary distribution

$$P(S) = \tfrac{1}{4}\,(1,\; 1,\; 1,\; 1),$$

and the generative complexity calculates to $C_{cl} = 2$. With the framework introduced in
Section 0] it is possible to calculate I(X; Y) which is the left hand side in (j^J) and amounts 
to I(X;Y) = Since C q = Cqi = 2 Proposition [J] holds not in this situation. The 
quantum internal state entropy of the induced quantum model (7) is
(the logarithm is taken to base 2)

$$C_q = \frac{1}{8}\Bigl(\log(64) + (-3 + 2\sqrt{2})\,\log\bigl(\tfrac{1}{8}(3 - 2\sqrt{2})\bigr)
 - (3 + 2\sqrt{2})\,\log\bigl(\tfrac{1}{8}(3 + 2\sqrt{2})\bigr)\Bigr) \approx 1.2018.$$
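
The value of $C_q$ can be cross-checked from the overlaps of the induced quantum internal states. The sketch below rests on two assumptions that are labelled in the code: the 4-symbol HMM emits symbol $r$ exactly when it moves to state number $r$ (order U, D, R, L), with probability 1/2 for staying and 1/4 for switching between the pairs {U, D} and {R, L}; and the overlaps are $\langle S_j|S_k\rangle = \sum_{r,l}\sqrt{T^{(r)}_{jl}T^{(r)}_{kl}}$. All function names are illustrative.

```python
import math

def four_symbol_kernels():
    """Assumed kernels: t[j][l] is the probability of moving from state j to
    state l while emitting the symbol; symbol r always leads to state r."""
    kernels = []
    for r in range(4):
        t = [[0.0] * 4 for _ in range(4)]
        for j in range(4):
            if j == r:
                t[j][r] = 0.5            # stay in the same state
            elif j // 2 != r // 2:
                t[j][r] = 0.25           # switch between {U, D} and {R, L}
        kernels.append(t)
    return kernels

def gram_matrix(kernels):
    """Overlaps <S_j|S_k> = sum over r, l of sqrt(T_jl * T_kl) (assumed form)."""
    n = len(kernels[0])
    return [[sum(math.sqrt(t[j][l] * t[k][l]) for t in kernels for l in range(n))
             for k in range(n)] for j in range(n)]

def is_eigenpair(matrix, value, vector, tol=1e-12):
    """Check matrix * vector = value * vector entry by entry."""
    n = len(matrix)
    return all(abs(sum(matrix[i][j] * vector[j] for j in range(n))
                   - value * vector[i]) <= tol for i in range(n))

def induced_entropy():
    """Entropy in bits of rho = (1/4) * sum_j |S_j><S_j|."""
    # For uniform weights, rho has the same non-zero eigenvalues as gram / 4.
    m = [[g / 4.0 for g in row] for row in gram_matrix(four_symbol_kernels())]
    r2 = math.sqrt(2.0)
    pairs = [
        (1.0 / 8.0, (1, -1, 0, 0)),
        (1.0 / 8.0, (0, 0, 1, -1)),
        ((3.0 + 2.0 * r2) / 8.0, (1, 1, 1, 1)),
        ((3.0 - 2.0 * r2) / 8.0, (1, 1, -1, -1)),
    ]
    if not all(is_eigenpair(m, lam, vec) for lam, vec in pairs):
        raise ValueError("closed-form eigenpairs do not match the Gram matrix")
    return sum(-lam * math.log2(lam) for lam, _ in pairs)

print(induced_entropy())
```

The four eigenvalues 1/8, 1/8 and $(3 \pm 2\sqrt{2})/8$ are the ones appearing inside the logarithms of the closed-form expression for $C_q$.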

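Monras et al. suggest in [Mon11] another quantum model for this process which is
only a 2-level quantum system instead of the 4-level quantum system given above. Given
the internal states $|\uparrow\rangle$, $|\downarrow\rangle$, $|+\rangle = \tfrac{1}{\sqrt{2}}(|\uparrow\rangle + |\downarrow\rangle)$ and $|-\rangle = \tfrac{1}{\sqrt{2}}(|\uparrow\rangle - |\downarrow\rangle)$ and quantum operations
$\mathcal{K}_r \rho = K_r\, \rho\, K_r^{*}$ for $r \in \{0, 1, 2, 3\}$ with

$$K_0 = \tfrac{1}{\sqrt{2}}\,|\uparrow\rangle\langle\uparrow|, \quad
K_1 = \tfrac{1}{\sqrt{2}}\,|\downarrow\rangle\langle\downarrow|, \quad
K_2 = \tfrac{1}{\sqrt{2}}\,|+\rangle\langle+|, \quad
K_3 = \tfrac{1}{\sqrt{2}}\,|-\rangle\langle-|,$$

one can derive from this HQMM the same statistical behaviour as from the classical HMM.
The quantum internal state entropy of this quantum model is smaller than $C_q$ and amounts
to

$$S(\rho) = 1,$$

with $\rho = \tfrac{1}{4}|\uparrow\rangle\langle\uparrow| + \tfrac{1}{4}|\downarrow\rangle\langle\downarrow| + \tfrac{1}{4}|+\rangle\langle+| + \tfrac{1}{4}|-\rangle\langle-|$.

⁴ The Stinespring-Kraus Theorem shows that every completely positive map admits a (non-unique)
operator-sum representation, so that it can be written as $\mathcal{K}\rho = \sum_i K_i\, \rho\, K_i^{*}$, where the $K_i$ are linear
operators on a Hilbert space [Kra83].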

This example shows that in general the induced quantum model (7) is not the one
with minimum quantum internal state entropy. The structure of quantum models with
minimal internal state entropy is an open question.
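
The claims about the 2-level model can be checked with a few lines of arithmetic. The sketch below assumes Kraus operators $K_r = \tfrac{1}{\sqrt{2}}|v_r\rangle\langle v_r|$ with $v_0, \dots, v_3$ equal to up, down, plus and minus, written with real components in the basis (up, down); the names `VECTORS`, `symbol_probability`, `average_state` and `entropy_of_diagonal` are illustrative.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)
# Assumed symbol order 0..3: up, down, plus, minus.
VECTORS = [(1.0, 0.0), (0.0, 1.0),
           (INV_SQRT2, INV_SQRT2), (INV_SQRT2, -INV_SQRT2)]

def symbol_probability(r, j):
    """P(symbol r | state j) = Tr(K_r rho_j K_r^*) = 0.5 * <v_r|v_j>^2."""
    inner = sum(a * b for a, b in zip(VECTORS[r], VECTORS[j]))
    return 0.5 * inner ** 2

def average_state():
    """rho = (1/4) * sum_j |v_j><v_j| as a 2x2 nested list."""
    return [[sum(0.25 * v[a] * v[b] for v in VECTORS) for b in range(2)]
            for a in range(2)]

def entropy_of_diagonal(rho):
    """Von Neumann entropy in bits, valid when rho is diagonal."""
    return sum(-rho[a][a] * math.log2(rho[a][a])
               for a in range(len(rho)) if rho[a][a] > 0.0)

rho = average_state()
print(rho, entropy_of_diagonal(rho))
print([symbol_probability(r, 0) for r in range(4)])
```

Starting from the state up, the symbol probabilities come out as 1/2, 0, 1/4, 1/4, and the averaged state is half the identity, so its entropy is one bit.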

Acknowledgment. I would like to thank Andreas Knauf for motivating me to work 
on this topic, for fruitful discussions and for suggestions to improve the text.